Lip Localization and Viseme Classification for Visual Speech Recognition
نویسندگان
چکیده
The need for an automatic lip-reading system is ever increasing. Infact, today, extraction and reliable analysis of facial movements make up an important part in many multimedia systems such as videoconference, low communication systems, lip-reading systems. In addition, visual information is imperative among people with special needs. We can imagine, for example, a dependent person ordering a machine with an easy lip movement or by a simple syllable pronunciation. Moreover, people with hearing problems compensate for their special needs by lip-reading as well as listening to the person with whome they are talking. We present in this paper a new approach to automatically localize lip feature points in a speaker’s face and to carry out a spatial-temporal tracking of these points. The extracted visual information is then classified in order to recognize the uttered viseme (visual phoneme). We have developed our Automatic Lip Feature Extraction prototype (ALiFE). Experiments revealed that our system recognizes 72.73% of French Vowels uttered by multiple speakers (female and male) under natural conditions.
منابع مشابه
A Lip Localization Based Visual Feature Extraction Method
This paper presents a lip localization based visual feature extraction method to segment lip region from image or video in real time. Lip localization and tracking is useful in many applications such as lip reading, lip synchronization, visual speech recognition, facial animation etc. To synchronize lip movements with input audio we need to first segment lip region from input image or video fra...
متن کاملDecoding visemes: improving machine lipreading (PhD thesis)
This thesis is about improving machine lip-reading, that is, the classification of speech from only visual cues of a speaker. Machine lip-reading is a niche research problem in both areas of speech processing and computer vision. Current challenges for machine lip-reading fall into two groups: the content of the video, such as the rate at which a person is speaking or; the parameters of the vid...
متن کاملLip Localization and Viseme Recognition from Video Sequences
Viseme (visual cue) recognition is one of the steps to be followed in building an automated lip-reading system. In order to recognize a viseme, one has to first detect the lips of the speaker from the video sequences and track them to extract the feature vectors for the final recognition. A novel method for liplocalization based on the color models has been proposed. Also, the basic possible li...
متن کاملPrimary research on the viseme system in Standard Chinese
The study of traditional phonetics indicates the shape of lips takes important effect on the articulations of consonants and vowels. [1]. AVSP (Audio-Visual Speech Processing) can improve the naturalness of synthetical speech and recognition rate of the speech recognition system. Especially in computer-synthesized face, the movements of lip-shape play a crucial role. The present research aims t...
متن کاملUsing viseme based acoustic models for speech driven lip synthesis
Speech driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time aligned using a speech recognizer to generate phonetic sequence which is then converted to corresponding viseme sequence to be animated. In this paper, we present a novel method for generation of the viseme sequence, which uses viseme based acoustic models, inste...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1301.4558 شماره
صفحات -
تاریخ انتشار 2006